Metadata-Version: 1.1
Name: polyglot
Version: 15.10.3
Summary: Polyglot is a natural language pipeline that supports massively multilingual applications.
Home-page: https://github.com/aboSamoor/polyglot
Author: Rami Al-Rfou
Author-email: rmyeid@gmail.com
License: GPLv3
Description: polyglot
        ========
        
        |Downloads| |Latest Version| |Build Status| |Documentation Status|
        
        .. |Downloads| image:: https://img.shields.io/pypi/dm/polyglot.svg
           :target: https://pypi.python.org/pypi/polyglot
        .. |Latest Version| image:: https://badge.fury.io/py/polyglot.svg
           :target: https://pypi.python.org/pypi/polyglot
        .. |Build Status| image:: https://travis-ci.org/aboSamoor/polyglot.png?branch=master
           :target: https://travis-ci.org/aboSamoor/polyglot
        .. |Documentation Status| image:: https://readthedocs.org/projects/polyglot/badge/?version=latest
           :target: https://readthedocs.org/builds/polyglot/
        
        Polyglot is a natural language pipeline that supports massively multilingual applications.
        
        - Free software: GPLv3 license
        - Documentation: http://polyglot.readthedocs.org.
        
        Features
        --------
        
        - Tokenization (165 Languages)
        - Language detection (196 Languages)
        - Named Entity Recognition (40 Languages)
        - Part of Speech Tagging (16 Languages)
        - Sentiment Analysis (136 Languages)
        - Word Embeddings (137 Languages)
        - Morphological analysis (135 Languages)
        - Transliteration (69 Languages)
        
        Developer
        ---------
        
        - Rami Al-Rfou @ ``rmyeid gmail com``
        
        Quick Tutorial
        --------------
        
        .. code:: python
        
            import polyglot
            from polyglot.text import Text, Word
        
        Language Detection
        ~~~~~~~~~~~~~~~~~~
        
        .. code:: python
        
            text = Text("Bonjour, Mesdames.")
            print("Language Detected: Code={}, Name={}\n".format(text.language.code, text.language.name))
        
        .. parsed-literal::
        
            Language Detected: Code=fr, Name=French
        
        Tokenization
        ~~~~~~~~~~~~
        
        .. code:: python
        
            zen = Text("Beautiful is better than ugly. "
                       "Explicit is better than implicit. "
                       "Simple is better than complex.")
            print(zen.words)
        
        .. parsed-literal::
        
            [u'Beautiful', u'is', u'better', u'than', u'ugly', u'.', u'Explicit', u'is', u'better', u'than', u'implicit', u'.', u'Simple', u'is', u'better', u'than', u'complex', u'.']
        
        .. code:: python
        
            print(zen.sentences)
        
        .. parsed-literal::
        
            [Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]
        
        Part of Speech Tagging
        ~~~~~~~~~~~~~~~~~~~~~~
        
        .. code:: python
        
            text = Text(u"O primeiro uso de desobediência civil em massa ocorreu em setembro de 1906.")
            print("{:<16}{}".format("Word", "POS Tag")+"\n"+"-"*30)
            for word, tag in text.pos_tags:
                print(u"{:<16}{:>2}".format(word, tag))
        
        .. parsed-literal::
        
            Word            POS Tag
            ------------------------------
            O               DET
            primeiro        ADJ
            uso             NOUN
            de              ADP
            desobediência   NOUN
            civil           ADJ
            em              ADP
            massa           NOUN
            ocorreu         ADJ
            em              ADP
            setembro        NOUN
            de              ADP
            1906            NUM
            .               PUNCT
        
        Named Entity Recognition
        ~~~~~~~~~~~~~~~~~~~~~~~~
        
        .. code:: python
        
            text = Text(u"In Großbritannien war Gandhi mit dem westlichen Lebensstil vertraut geworden")
            print(text.entities)
        
        .. parsed-literal::
        
            [I-LOC([u'Gro\xdfbritannien']), I-PER([u'Gandhi'])]
        
        Polarity
        ~~~~~~~~
        
        .. code:: python
        
            print("{:<16}{}".format("Word", "Polarity")+"\n"+"-"*30)
            for w in zen.words[:6]:
                print("{:<16}{:>2}".format(w, w.polarity))
        
        .. parsed-literal::
        
            Word            Polarity
            ------------------------------
            Beautiful        0
            is               0
            better           1
            than             0
            ugly            -1
            .                0
        
        Embeddings
        ~~~~~~~~~~
        
        .. code:: python
        
            word = Word("Obama", language="en")
            print("Neighbors (Synonyms) of {}".format(word)+"\n"+"-"*30)
            for w in word.neighbors:
                print("{:<16}".format(w))
            print("\n\nThe first 10 dimensions out of the {} dimensions\n".format(word.vector.shape[0]))
            print(word.vector[:10])
        
        .. parsed-literal::
        
            Neighbors (Synonyms) of Obama
            ------------------------------
            Bush
            Reagan
            Clinton
            Ahmadinejad
            Nixon
            Karzai
            McCain
            Biden
            Huckabee
            Lula
        
        
            The first 10 dimensions out of the 256 dimensions
        
            [-2.57382345  1.52175975  0.51070285  1.08678675 -0.74386948 -1.18616164
              2.92784619 -0.25694436 -1.40958667 -2.39675403]
        
        Morphology
        ~~~~~~~~~~
        
        .. code:: python
        
            word = Text("Preprocessing is an essential step.").words[0]
            print(word.morphemes)
        
        .. parsed-literal::
        
            [u'Pre', u'process', u'ing']
        
        Transliteration
        ~~~~~~~~~~~~~~~
        
        .. code:: python
        
            from polyglot.transliteration import Transliterator
            transliterator = Transliterator(source_lang="en", target_lang="ru")
            print(transliterator.transliterate(u"preprocessing"))
        
        .. parsed-literal::
        
            препрокессинг
        
        History
        -------
        
        "14.11" (2014-01-11)
        ~~~~~~~~~~~~~~~~~~~~
        
        * First release on PyPI.
        
        "15.5.2" (2015-05-02)
        ~~~~~~~~~~~~~~~~~~~~~
        
        * Polyglot is feature complete.
        
        "15.10.03" (2015-10-03)
        ~~~~~~~~~~~~~~~~~~~~~~~
        
        * Moved the polyglot models mirror from Google Cloud Storage to the Stony Brook University DSL lab.
Keywords: polyglot
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Natural Language :: Afrikaans
Classifier: Natural Language :: Arabic
Classifier: Natural Language :: Bengali
Classifier: Natural Language :: Bosnian
Classifier: Natural Language :: Bulgarian
Classifier: Natural Language :: Catalan
Classifier: Natural Language :: Chinese (Simplified)
Classifier: Natural Language :: Chinese (Traditional)
Classifier: Natural Language :: Croatian
Classifier: Natural Language :: Czech
Classifier: Natural Language :: Danish
Classifier: Natural Language :: Dutch
Classifier: Natural Language :: English
Classifier: Natural Language :: Esperanto
Classifier: Natural Language :: Finnish
Classifier: Natural Language :: French
Classifier: Natural Language :: Galician
Classifier: Natural Language :: German
Classifier: Natural Language :: Greek
Classifier: Natural Language :: Hebrew
Classifier: Natural Language :: Hindi
Classifier: Natural Language :: Hungarian
Classifier: Natural Language :: Icelandic
Classifier: Natural Language :: Indonesian
Classifier: Natural Language :: Italian
Classifier: Natural Language :: Japanese
Classifier: Natural Language :: Javanese
Classifier: Natural Language :: Korean
Classifier: Natural Language :: Latin
Classifier: Natural Language :: Latvian
Classifier: Natural Language :: Macedonian
Classifier: Natural Language :: Malay
Classifier: Natural Language :: Marathi
Classifier: Natural Language :: Norwegian
Classifier: Natural Language :: Panjabi
Classifier: Natural Language :: Persian
Classifier: Natural Language :: Polish
Classifier: Natural Language :: Portuguese
Classifier: Natural Language :: Portuguese (Brazilian)
Classifier: Natural Language :: Romanian
Classifier: Natural Language :: Russian
Classifier: Natural Language :: Serbian
Classifier: Natural Language :: Slovak
Classifier: Natural Language :: Slovenian
Classifier: Natural Language :: Spanish
Classifier: Natural Language :: Swedish
Classifier: Natural Language :: Tamil
Classifier: Natural Language :: Telugu
Classifier: Natural Language :: Thai
Classifier: Natural Language :: Turkish
Classifier: Natural Language :: Ukrainian
Classifier: Natural Language :: Urdu
Classifier: Natural Language :: Vietnamese
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic